-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor File.ReadAllBytes* improvements #61519
Conversation
…case bug for files: Array.MaxLength < Length < int.MaxValue
…an additional sys call on Unix and every OS is able to recognize the disk access patterns anyway
Tagging subscribers to this area: @dotnet/area-system-io Issue DetailsChanges:
string path = "here.dat";
using (FileStream fs = File.Create(path))
{
fs.SetLength(Array.MaxLength + 1L);
}
File.ReadAllBytes(path); PS D:\projects\repros\maxArraySize> dotnet run -c Release
Out of memory.
|
Just curious, where does the 2GB file limit come from? Do we have to put it in a |
There's 0% benefit on any Windows system with any file system? I question whether that's actually the case. Should we simply change the implementation to only specify this on Windows, since there it's (I believe, please correct me if I'm wrong) free whereas we need to make an additional syscall on Unix and thus it really needs to be worth its weight? |
fyi, Linux doesn't recognize much. Use of It makes sense to leave it out for |
Co-authored-by: Dan Moseley <danmose@microsoft.com>
According to https://devblogs.microsoft.com/oldnewthing/20120120-00/?p=8493 (which might be outdated as it was referring to Windows 7):
But here for 99.99% of the cases we perform a single read (the size of the buffer passed to the OS is equal to the file size). I would expect it to be smart enough to recognize the pattern.
From the benchmarks that I've run (Win 11 + SSD + NTFS + BitLocker) I see no perf gains. I can add it for Windows if you think it's worth it just in case it's beneficial for some other configs (older Windows? FAT32?). |
Afaik because both Array and Span indexers use
In this case not, as the public contract states that this method returns |
src/libraries/System.IO.FileSystem/tests/File/ReadWriteAllBytesAsync.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/tests/File/ReadWriteAllBytesAsync.cs
Outdated
Show resolved
Hide resolved
Should we make SequentialScan then in general a nop on Linux / Unix? |
|
||
#if DEBUG | ||
fileLength = 0; // improve the test coverage for ReadAllBytesUnknownLength | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead just ensure we have tests that read files that return 0 for the length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead just ensure we have tests that read files that return 0 for the length?
We have some tests, and you are the person who authored most of them: dotnet/corefx#28388
The problem is that they don't guarantee that all possible code paths are going to be executed (using stack-allocated array, renting an array from the pool, returning it and renting a larger array etc) as they deal with a virtual file system that seems to be exposing files only for reading.
We could use pipes (#58434 introduced some tests for that ), but it would require more work and handling all edge cases.
I know that setting fileLength
to 0
with #if DEBUG
is ugly, but it gives us what we need with a very little effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it gives us what we need with a very little effort.
Does it? It means, for example, that Debug.Asserts in the normal non-zero code paths are now rendered useless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means, for example, that Debug.Asserts in the normal non-zero code paths are now rendered useless.
Which asserts do you mean? I can't see any in the code path for regular files that report length properly:
runtime/src/libraries/System.Private.CoreLib/src/System/IO/File.cs
Lines 548 to 567 in 2576390
private static async Task<byte[]> InternalReadAllBytesAsync(SafeFileHandle sfh, int count, CancellationToken cancellationToken) | |
{ | |
using (sfh) | |
{ | |
int index = 0; | |
byte[] bytes = new byte[count]; | |
do | |
{ | |
int n = await RandomAccess.ReadAtOffsetAsync(sfh, bytes.AsMemory(index), index, cancellationToken).ConfigureAwait(false); | |
if (n == 0) | |
{ | |
ThrowHelper.ThrowEndOfFileException(); | |
} | |
index += n; | |
} while (index < count); | |
return bytes; | |
} | |
} |
runtime/src/libraries/System.Private.CoreLib/src/System/IO/File.cs
Lines 275 to 289 in 2576390
int index = 0; | |
int count = (int)fileLength; | |
byte[] bytes = new byte[count]; | |
while (count > 0) | |
{ | |
int n = RandomAccess.ReadAtOffset(sfh, bytes.AsSpan(index, count), index); | |
if (n == 0) | |
{ | |
ThrowHelper.ThrowEndOfFileException(); | |
} | |
index += n; | |
count -= n; | |
} | |
return bytes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which asserts do you mean?
Any on any code paths used from here all the way down through the runtime. The change says that the 99.9% code path is never executed in debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any on any code paths used from here all the way down through the runtime.
These paths (namely RandomAccess.ReadAtOffsetAsync
and RandomAccess.ReadAtOffset
) have an excellent test code coverage (directly as RandomAccess
tests and indirectly via FileStream
tests) so I am not worried about it.
I do understand that it's not a clear win and a a trade off.
// SequentialScan is a perf hint that requires extra sys-call on non-Windows OSes. | ||
FileOptions options = (OperatingSystem.IsWindows() ? FileOptions.SequentialScan : FileOptions.None) | FileOptions.Asynchronous; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
// SequentialScan is a perf hint that requires extra sys-call on non-Windows OSes. | |
FileOptions options = (OperatingSystem.IsWindows() ? FileOptions.SequentialScan : FileOptions.None) | FileOptions.Asynchronous; | |
FileOptions options = FileOptions.Asynchronous; | |
if (OperatingSystem.IsWindows()) | |
{ | |
// SequentialScan is a perf hint that requires extra sys-call on non-Windows OSes. | |
options |= FileOptions.SequentialScan; | |
} |
?
Or as this is corelib, it could also be e.g.
// SequentialScan is a perf hint that requires extra sys-call on non-Windows OSes. | |
FileOptions options = (OperatingSystem.IsWindows() ? FileOptions.SequentialScan : FileOptions.None) | FileOptions.Asynchronous; | |
const FileOptions Options = | |
#if TARGET_WINDOWS | |
FileOptions.Asynchronous | FileOptions.SequentialScan; | |
#else | |
// SequentialScan is a perf hint that requires extra sys-call on non-Windows OSes. | |
FileOptions.Asynchronous; | |
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced, as the proposals require too much space. I've moved FileOptions.Asynchronous
to the left, hope it helps:
- FileOptions options = (OperatingSystem.IsWindows() ? FileOptions.SequentialScan : FileOptions.None) | FileOptions.Asynchronous;
+ FileOptions options = FileOptions.Asynchronous | (OperatingSystem.IsWindows() ? FileOptions.SequentialScan : FileOptions.None);
|
||
#if DEBUG | ||
fileLength = 0; // improve the test coverage for InternalReadAllBytesUnknownLengthAsync | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question/comment as earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Supposedly the larger read-ahead is going to deliver performance in some cases. I think we should not use it by default on Unix, and still allow the user to opt-in when he creates the @adamsitnik can you also update the private |
// bufferSize == 1 used to avoid unnecessary buffer in FileStream | ||
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 1, FileOptions.SequentialScan)) | ||
// SequentialScan is a perf hint that requires extra sys-call on non-Windows OSes. | ||
FileOptions options = OperatingSystem.IsWindows() ? FileOptions.SequentialScan : FileOptions.None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe turn this into a static property, like PlatformSequentialScanOption
.
Changes:
FileStream
toRandomAccess
, reduces the allocations by +- 100 bytesFileOptions.SequentialScan
as it requires an additional sys-call on Unix. Modern kernels are really good at recognizing file access patterns, we don't get anything from this hint. Reading small files (< 1kb) on Unix is 8% faster.Array.MaxLength
insteadint.MaxSize
for determining the limit. It fixes an edge case bug where for files of size> Array.MaxLength
but< int.MaxSize
where it was throwingOOM
insteadIOException
:PS D:\projects\repros\maxArraySize> dotnet run -c Release Out of memory.
returningInternalTask
-related logicfileLength = 0
for DEBUG builds